Loading all libraries used in the time-series analysis
library(tidyverse)
library(tidyquant)
library(modelr)
library(gridExtra)
library(grid)
library(ggplot2)
library(lubridate)
library(xts)
library(dplyr)
library(plotly)
library(hrbrthemes)
library(dygraphs)
library(htmlwidgets)
Loading text time series data
global = read.delim('/users/rhome/globaltemp.txt', stringsAsFactors = FALSE, header = TRUE, sep = ",")
Display the column names
names(global)
[1] "dt"
[2] "LandAverageTemperature"
[3] "LandAverageTemperatureUncertainty"
[4] "LandMaxTemperature"
[5] "LandMaxTemperatureUncertainty"
[6] "LandMinTemperature"
[7] "LandMinTemperatureUncertainty"
[8] "LandAndOceanAverageTemperature"
[9] "LandAndOceanAverageTemperatureUncertainty"
Check the class of the date column
#global$dt = as.Date(global$dt, format="%y-%m-%d")
class(global$dt)
[1] "character"
Convert the column to Date class
dateOnly = as.Date(global$dt)
class(dateOnly)
[1] "Date"
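A minimal sketch of the same conversion with an explicit format string, assuming the dates are stored as `YYYY-MM-DD` text with a four-digit year, as the output suggests. The vector `dt_text` is a hypothetical stand-in for `global$dt`. Note that `%Y` matches a four-digit year, while `%y` expects a two-digit one.

```r
# Hypothetical stand-in for global$dt
dt_text <- c("1750-01-01", "1750-02-01", "2015-12-01")

# %Y = four-digit year; a value that does not match the format becomes NA
dates <- as.Date(dt_text, format = "%Y-%m-%d")

class(dates)                    # "Date"
n_failed <- sum(is.na(dates))   # number of values that failed to parse
years <- format(dates, "%Y")    # extract the year as text
```

Checking `n_failed` right after the conversion is a cheap way to catch a wrong format string before any plotting.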
Let's check the first six values of the data
head(dateOnly)
[1] "1750-01-01" "1750-02-01" "1750-03-01" "1750-04-01" "1750-05-01" "1750-06-01"
Let's check the last six values of the data
tail(dateOnly)
[1] "2015-07-01" "2015-08-01" "2015-09-01" "2015-10-01" "2015-11-01" "2015-12-01"
An overview of each column
summary(global)
      dt             LandAverageTemperature   LandAverageTemperatureUncertainty
 Length:3192         Min.   :-2.080           Min.   :0.0340
 Class :character    1st Qu.: 4.312           1st Qu.:0.1867
 Mode  :character    Median : 8.611           Median :0.3920
                     Mean   : 8.375           Mean   :0.9385
                     3rd Qu.:12.548           3rd Qu.:1.4192
                     Max.   :19.021           Max.   :7.8800
                     NA's   :12               NA's   :12

 LandMaxTemperature   LandMaxTemperatureUncertainty   LandMinTemperature
 Min.   : 5.90        Min.   :0.0440                  Min.   :-5.407
 1st Qu.:10.21        1st Qu.:0.1420                  1st Qu.:-1.335
 Median :14.76        Median :0.2520                  Median : 2.950
 Mean   :14.35        Mean   :0.4798                  Mean   : 2.744
 3rd Qu.:18.45        3rd Qu.:0.5390                  3rd Qu.: 6.779
 Max.   :21.32        Max.   :4.3730                  Max.   : 9.715
 NA's   :1200         NA's   :1200                    NA's   :1200

 LandMinTemperatureUncertainty   LandAndOceanAverageTemperature
 Min.   :0.0450                  Min.   :12.47
 1st Qu.:0.1550                  1st Qu.:14.05
 Median :0.2790                  Median :15.25
 Mean   :0.4318                  Mean   :15.21
 3rd Qu.:0.4582                  3rd Qu.:16.40
 Max.   :3.4980                  Max.   :17.61
 NA's   :1200                    NA's   :1200

 LandAndOceanAverageTemperatureUncertainty
 Min.   :0.0420
 1st Qu.:0.0630
 Median :0.1220
 Mean   :0.1285
 3rd Qu.:0.1510
 Max.   :0.4570
 NA's   :1200
The summary shows that the data contains NA's. We examine three columns: land average temperature, land and ocean average temperature, and land maximum temperature. Start with the land average temperature: it has 12 NA's, a minimum of -2.080, a maximum of 19.021, a median of 8.611 and a mean of 8.375. The mean and median are close to each other, which suggests little skew, and 12 NA's out of 3192 records are few enough to ignore. The first quartile, median and third quartile (4.312, 8.611, 12.548) are spaced evenly, about 4 apart, which suggests the data in this column is roughly symmetric and close to a normal distribution. Below is the distribution of the data as a scatter plot.
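The reasoning above (count the NA's, compare mean with median, look at the quartile spacing) can be sketched as follows. The vector `temps` is made-up data standing in for one temperature column, and the Shapiro-Wilk test is an addition of this sketch rather than part of the original analysis; it is one common, more direct check of normality than comparing mean and median.

```r
# Hypothetical sample with missing values, standing in for one temperature column
temps <- c(2.1, 4.3, NA, 8.6, 12.5, 15.0, NA, 6.4, 9.9, 3.2)

n_missing <- sum(is.na(temps))          # count of missing values
avg <- mean(temps, na.rm = TRUE)        # mean, ignoring NA
med <- median(temps, na.rm = TRUE)      # median, ignoring NA
q <- quantile(temps, c(0.25, 0.5, 0.75), na.rm = TRUE)   # quartiles

# Gap between mean and median relative to the spread; values near 0 suggest symmetry
rel_gap <- (avg - med) / sd(temps, na.rm = TRUE)

# Spacing between consecutive quartiles; similar gaps suggest a symmetric middle half
quartile_gaps <- diff(q)

# Shapiro-Wilk normality test (NA values are dropped internally)
p_value <- shapiro.test(temps)$p.value
```

A mean close to the median only rules out strong skew; a two-peaked or flat distribution can pass that check too, which is why a dedicated test or a histogram is worth a look.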
plot(global$LandAverageTemperature)
par(mfrow=c(1,1),bg="lavender")
barplot(global$LandAverageTemperature, col = "lightpink", ylim = c(-5, 20), ylab = "temperature", xlab = "Land average temperature distribution")
The graph above shows that the data is spread evenly across its range. Plotting fewer values would make the pink fill of the bars visible.
hist(global$LandAverageTemperature, col = "pink")
Below is an interactive plot of the land average temperature
p1 <- global %>%
  ggplot(aes(x = dateOnly, y = LandAverageTemperature)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landAverageTemperature") +
  theme_ipsum()
p1 = ggplotly(p1)
Removed 12 rows containing missing values (position_stack).
p1
The interactive graph shows that the highest average temperature, 19.021, was recorded on 1761-07-01, and the lowest, -2.080, on 1768-01-01. Comparing July values across the record gives 15.868 on 1750-07-01, 14.492 on 1850-07-01, 14.140 on 1950-07-01 and 15.051 on 2015-07-01, so the July average stays roughly in the range of 14 to 15. I still read this as an overall increase because the monthly minimum has not fallen below zero since 1838 (-0.057), which suggests milder winters and an overall upward trend.
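The claim about the yearly minimum can be checked in code rather than by reading values off the graph. This is a sketch on made-up data: `dates` and `temps` are hypothetical stand-ins for `dateOnly` and `global$LandAverageTemperature`, built so that the last sub-zero month falls in 1838.

```r
# Hypothetical monthly data: 3 years x 12 months (1837 to 1839)
dates <- seq(as.Date("1837-01-01"), by = "month", length.out = 36)
temps <- rep(c(-0.5, 1, 4, 8, 11, 13, 14, 13, 11, 8, 4, 0.5), times = 3)
temps[25] <- 0.2   # January of the third year stays above zero

years <- as.integer(format(dates, "%Y"))

# Coldest and mean monthly value within each calendar year
annual_min  <- tapply(temps, years, min, na.rm = TRUE)
annual_mean <- tapply(temps, years, mean, na.rm = TRUE)

# Last year in which any month fell below zero
last_negative_year <- max(years[!is.na(temps) & temps < 0])
last_negative_year
```

Plotting `annual_min` and `annual_mean` against the year shows directly whether winters are getting milder while the mean drifts upward.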
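Below is the land average temperature as a monthly time series, used to see where the trend is heading
# Monthly observations starting in January 1750; start takes c(year, month), not a date
global.timeseries = ts(global$LandAverageTemperature, start = c(1750, 1), frequency = 12)
plot(global.timeseries)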
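A short sketch of how `ts()` reads its arguments. `start` expects a year, optionally followed by a period within that year, and `frequency` is the number of observations per year. An unquoted expression such as `1850-12-01` is evaluated as subtraction, not as a date. The series `monthly` here is made-up data.

```r
# Written without quotes, this is arithmetic: 1850 minus 12 minus 1
not_a_date <- 1850-12-01
not_a_date              # 1837

# Hypothetical monthly values: 24 months starting January 1750
monthly <- ts(1:24, start = c(1750, 1), frequency = 12)

start(monthly)          # 1750 1
end(monthly)            # 1751 12
frequency(monthly)      # 12
```

With `frequency = 12` the x-axis of `plot(monthly)` is in calendar years, so 3192 monthly records span 266 years instead of being stretched over a much longer axis.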
The series suggests a continued increase in the land average temperature, which is not good for living creatures on earth. The Paris climate agreement aims to hold warming well below 2 degrees Celsius above pre-industrial levels and to pursue a limit of 1.5; beyond that range, the temperature of the earth is most likely to make life difficult for all creatures, not only humans.
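Plotting a `ts` object shows the recorded values only; to extend the trend past the data, a model has to be fitted. Below is a minimal sketch on simulated data, assuming a straight-line trend on yearly means is an acceptable first approximation. The linear model is a technique chosen for this sketch, not the method used in the original analysis, and all names and numbers in it are hypothetical.

```r
# Hypothetical monthly series: seasonal cycle plus a slow upward drift and noise
set.seed(1)
n <- 12 * 50
month_idx <- rep(1:12, times = 50)
values <- 8 + 6 * sin(2 * pi * (month_idx - 4) / 12) + 0.001 * seq_len(n) + rnorm(n, sd = 0.3)
monthly <- ts(values, start = c(1966, 1), frequency = 12)

# Average over each year to remove the seasonal cycle
annual <- aggregate(monthly, nfrequency = 1, FUN = mean)

# Fit a straight-line trend to the yearly means
fit_data <- data.frame(year = as.numeric(time(annual)), temp = as.numeric(annual))
fit <- lm(temp ~ year, data = fit_data)

# Extend the line 10 years past the data, with prediction intervals
future <- data.frame(year = max(fit_data$year) + 1:10)
pred <- predict(fit, newdata = future, interval = "prediction")

slope <- coef(fit)[["year"]]   # estimated change per year
head(pred)                     # columns: fit, lwr, upr
```

The `lwr` and `upr` columns give a range for each predicted year, which is a more honest summary than a single line when extrapolating.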
Now let's discuss the land maximum temperature. The lowest of the recorded monthly maximum temperatures is 5.90 and the highest is 21.32. There are 1200 NA's. The median (14.76) and mean (14.35) are both a little above 14, which at first suggests the data is evenly distributed. Let's visualize:
plot(global$LandMaxTemperature)
par(mfrow=c(1,1),bg="cornsilk")
barplot(global$LandMaxTemperature, col = "lightpink", ylim = c(0, 25), ylab = "temperature", xlab = "Land max temperature distribution")
hist(global$LandMaxTemperature, col = "pink")
In the visualizations the data appears as five separate, similarly shaped curves. Because the remaining data is dense, let's ignore the NA's. The histogram, however, shows that the data is skewed.
p <- global %>%
  ggplot(aes(x = dateOnly, y = LandMaxTemperature)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landMaxtemperature") +
  theme_ipsum()
p = ggplotly(p)
Removed 1200 rows containing missing values (position_stack).
p
In the interactive graph, the highest early peaks are 20.426 on 1854-07-01, 20.733 on 1877-07-01 (notably the same month and day) and 20.553 on 1915-07-01. From the end of the 20th century the peaks rise more often: 20.972 on 1998-07-01, 21.199 on 2002-07-01 and 21.320 on 2011-07-01. The maximum temperature is rising quickly in the 21st century, most likely as part of global warming and its consequences.
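Peaks like these can be pulled out of the data instead of being read off the graph. Below is a sketch that flags record highs with `cummax()`. The data frame `obs` is hypothetical: its values are rounded and partly invented for illustration, and a real column would need its NA rows dropped first, because `cummax()` propagates NA.

```r
# Hypothetical July maxima, standing in for LandMaxTemperature
obs <- data.frame(
  date = as.Date(c("1854-07-01", "1877-07-01", "1900-07-01", "1915-07-01",
                   "1998-07-01", "2002-07-01", "2011-07-01")),
  temp = c(20.4, 20.7, 20.1, 20.6, 21.0, 21.2, 21.3)
)

# A record is a value strictly above every earlier value;
# the first observation counts as a record by definition
previous_max <- c(-Inf, head(cummax(obs$temp), -1))
records <- obs[obs$temp > previous_max, ]
records

# Top three values overall, highest first
top3 <- head(obs[order(obs$temp, decreasing = TRUE), ], 3)
top3
```

If records cluster toward the end of the series, as they do in this made-up example, that supports the reading that the maxima are rising.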
This is just my own hypothesis: the heat of the lava inside the earth's core is balanced by the temperatures of the earth's layers, which in turn are balanced by the ice and water on the surface. Now that glaciers are melting, the earth itself will most likely warm, and with carbon dioxide trapping the sun's heat on top of that, the situation seems more dangerous than predicted. Let's look at the time series:
# Same monthly index as the full table (January 1750 onward); rows with NA are simply not drawn
global.timeseries3 = ts(global$LandMaxTemperature, start = c(1750, 1), frequency = 12)
plot(global.timeseries3)
The series shows an increasing trend which, if it continues, points to further warming over the next few centuries.
Now let's discuss the land and ocean average temperature, LandAndOceanAverageTemperature. There are 1200 NA's in the data. The minimum is 12.47. The first quartile, median and third quartile lie between 14.05 and 16.40, with the median at about 15, which suggests a roughly normal distribution. The maximum of 17.61 is quite noticeable. Let's visualize the data:
plot(global$LandAndOceanAverageTemperature)
par(mfrow=c(1,1),bg="lavender")
barplot(global$LandAndOceanAverageTemperature,col="lightpink",ylim=c(0,20),ylab="temperature",xlab="Land and ocean average temperature distribution")
hist(global$LandAndOceanAverageTemperature, col = "pink")
The histogram shows two peaks, each shaped like a normal curve.
p3 <- global %>%
  ggplot(aes(x = dateOnly, y = LandAndOceanAverageTemperature)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landAndOceanAverageTemperature") +
  theme_ipsum()
p3 = ggplotly(p3)
Removed 1200 rows containing missing values (position_stack).
p3
In the interactive graph, the peaks are 17.060 on 1866-07-01, 17.131 on 1941-07-01, 17.106 on 1945-08-01, 17.081 on 1951-08-01, 17.047 on 1977-07-01, 17.145 on 1983-08-01, 17.296 on 1987-07-01, 17.375 on 1995-07-01, 17.609 on 1998-07-01, 17.450 on 2001-07-01, 17.578 on 2009-07-01 and 17.611 on 2015-07-01.
This is an increasing trend in land and ocean temperature.
# Monthly index from January 1750; rows with NA are simply not drawn
global.timeseries2 = ts(global$LandAndOceanAverageTemperature, start = c(1750, 1), frequency = 12)
plot(global.timeseries2)
The series climbs well above its earlier levels. The graph is not very clear, but the overall direction is upward.
# New graph for a closer look at the land average temperature, using only every 100th row
idx = seq(1, length(global.timeseries), 100)
new_maxLandAvrtemp = as.numeric(global.timeseries[idx])
new_date = dateOnly[idx]
# Pair each sampled date with its temperature; merge() on two bare vectors would return every combination
z = data.frame(x = new_date, y = new_maxLandAvrtemp)
z
head(z)
tail(z)
plot(z$x, z$y, xlab = "date", ylab = "every 100th record of land average temperature", type = "l")
# One observation every 100 months, starting January 1750
global.timeseries4 = ts(z$y, start = 1750, deltat = 100/12)
plot(global.timeseries4)
p <- z %>%
  ggplot(aes(x = x, y = y)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("new_landAverageTemperature") +
  theme_ipsum()
p = ggplotly(p)
p
The graph above is clearer. It seems that temperature changes very slowly over the years and the peaks are almost equal. Most likely this is why climate change goes unnoticed, but as the trend shows, it is happening.
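Another way to make a slow trend visible, without discarding most of the rows, is to average out the seasonal cycle. Below is a minimal sketch using a centered 12-month moving average on simulated data. The series and its drift of 0.002 per month are hypothetical, and the moving average is a technique added here for illustration, not part of the original analysis.

```r
# Hypothetical monthly series with a seasonal cycle and a slow drift
n <- 12 * 30
month_idx <- rep(1:12, times = 30)
values <- 8 + 6 * sin(2 * pi * (month_idx - 4) / 12) + 0.002 * seq_len(n)
monthly <- ts(values, start = c(1986, 1), frequency = 12)

# Centered 12-month moving average: 13 points, half weight on the two end months.
# stats::filter is called explicitly because dplyr and plotly mask filter().
w <- c(0.5, rep(1, 11), 0.5) / 12
smooth <- stats::filter(monthly, filter = w, sides = 2)

# The first and last 6 values are NA because the window is incomplete there
n_edge_na <- sum(is.na(smooth))

# Month-to-month change of the smoothed series: the seasonal swing is gone,
# leaving only the slow drift
steps <- diff(as.numeric(na.omit(smooth)))
range(steps)
```

Plotting `smooth` on top of `monthly` shows a nearly flat line that rises steadily, which is the kind of slow change that is hard to notice among the seasonal peaks.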